{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "This is an example of a Python notebook used to train a model (decision tree) and then export it as PMML so that it can be easily integrated with additional decision logic (DMN https://drools.org/learn/dmn.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use case: credit card dispute risk\n",
    "In this example, an end user has filed a credit card dispute and we want to predict the risk associated with the disputed transaction."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prerequisites\n",
    "To run this notebook you need:\n",
    "- Python 3 https://www.python.org/downloads/\n",
    "- pip https://pypi.org/project/pip/\n",
    "- Jupyter https://jupyter.org/ (`pip install jupyterlab`)\n",
    "\n",
    "Then install the dependencies:\n",
    "- pandas\n",
    "- scikit-learn\n",
    "- numpy\n",
    "- nyoka\n",
    "- matplotlib\n",
    "\n",
    "All of them can be installed by cloning this repo and running `pip install -r ./binder/requirements.txt`.\n",
    "\n",
    "Finally, start the environment with the command `jupyter notebook`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.pipeline import Pipeline\n",
    "from nyoka import skl_to_pmml"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 1 - Load and prepare data\n",
    "Data are usually available as files in a text (CSV) or binary (Parquet, Avro) format, and there are many libraries to load them from local or remote storage.\n",
    "\n",
    "After the loading step it is quite common to perform some preparation and cleanup actions: check domain boundaries, handle missing values, normalize strings, convert enumerations to numbers, etc."
   ]
  },
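  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The preparation actions listed above can be sketched with pandas. This is a minimal, hypothetical example: the `holder_status` column, its values, the fill rules and the `status_to_index` mapping are assumptions made for illustration and are not taken from `input_data.csv`.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical raw data, invented for illustration only\n",
    "raw = pd.DataFrame({\n",
    "    'holder_status': [' Standard', 'gold ', 'SILVER', None],\n",
    "    'amount': [246.35, -5.0, None, 159.58],\n",
    "})\n",
    "\n",
    "# Normalize strings: trim whitespace and lower-case\n",
    "raw['holder_status'] = raw['holder_status'].str.strip().str.lower()\n",
    "# Handle missing values in the status column with an assumed default\n",
    "raw['holder_status'] = raw['holder_status'].fillna('standard')\n",
    "# Check domain boundaries: an amount is assumed to be non-negative\n",
    "raw['amount'] = raw['amount'].clip(lower=0)\n",
    "# Handle missing amounts with the median of the known amounts\n",
    "raw['amount'] = raw['amount'].fillna(raw['amount'].median())\n",
    "# Convert the enumeration to a number (assumed mapping)\n",
    "status_to_index = {'standard': 0, 'silver': 1, 'gold': 2}\n",
    "raw['holder_index'] = raw['holder_status'].map(status_to_index)\n",
    "\n",
    "print(raw)\n",
    "```\n",
    "\n",
    "Which fill value or boundary is appropriate depends on the data; the choices here only show where each action fits."
   ]
  },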
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>holder_index</th>\n",
       "      <th>amount</th>\n",
       "      <th>dispute_risk</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>246.35</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>10.00</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>113.05</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>159.58</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>197.64</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   holder_index  amount  dispute_risk\n",
       "0             0  246.35             5\n",
       "1             0   10.00             2\n",
       "2             1  113.05             2\n",
       "3             2  159.58             3\n",
       "4             0  197.64             4"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('input_data.csv')\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# Step 2 - Prepare training set and test set\n",
    "When a model is trained, it is important to use different sets of data to train it and to test it, so that the test score reflects how the model behaves on data it has not seen."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "test_size = 0.4             # fraction of the data (0.4 = 40%) held out as the test set\n",
    "random_state = 23           # fixed seed to make the random split reproducible\n",
    "inputs = df[['amount', 'holder_index']]\n",
    "outputs = df['dispute_risk']\n",
    "\n",
    "input_train, input_test, output_train, output_test = train_test_split(inputs, outputs, test_size=test_size, random_state=random_state)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# Step 3 - Train the model\n",
    "There are many different models that can be used. In this example we will use a decision tree classifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "pipeline = Pipeline([\n",
    "    (\"classifier\", DecisionTreeClassifier())\n",
    "])\n",
    "trained_model = pipeline.fit(input_train, output_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 4 - Test the model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are multiple ways to test the model. First of all, you should test it using the test data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "model_score: 0.975\n"
     ]
    }
   ],
   "source": [
    "model_score = trained_model.score(input_test, output_test)\n",
    "print(\"model_score: \" + str(model_score))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: Watch out for overfitting ( https://en.wikipedia.org/wiki/Overfitting ) while you train and test your model. For example, a score of 0.99 or similar deserves a closer look: it can be a sign of a probable overfit, especially when the score on the training data is much higher than the score on unseen data."
   ]
  },
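  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to look for overfitting is to compare the score on the training data with the score on held-out data, and to see how the gap changes when the tree is constrained. The sketch below is only an illustration: it uses synthetic data from `make_classification` instead of the dispute dataset, and the `max_depth` values are arbitrary assumptions.\n",
    "\n",
    "```python\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import cross_val_score, train_test_split\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "\n",
    "# Synthetic stand-in data (an assumption, not the dispute dataset)\n",
    "X, y = make_classification(n_samples=500, n_features=5, n_informative=3, random_state=23)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=23)\n",
    "\n",
    "# None lets the tree grow until it fits the training data; 3 caps its depth\n",
    "for max_depth in (None, 3):\n",
    "    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=23)\n",
    "    tree.fit(X_train, y_train)\n",
    "    cv_mean = cross_val_score(tree, X_train, y_train, cv=5).mean()\n",
    "    print('max_depth:', max_depth)\n",
    "    print('  train score:', tree.score(X_train, y_train))\n",
    "    print('  test score:', tree.score(X_test, y_test))\n",
    "    print('  5-fold CV mean on training data:', cv_mean)\n",
    "```\n",
    "\n",
    "A training score far above the test and cross-validation scores suggests that the tree has memorized the training data. The same comparison could be applied to `trained_model` with `input_train` and `input_test`."
   ]
  },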
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Additionally, you can print the predicted values next to the real values to compare them visually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dtree prediction</th>\n",
       "      <th>truth</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>519</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>837</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>208</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>525</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>978</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>583</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>508</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>158</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>589</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>201</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>280</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>974</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>236</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>203</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>790</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>428</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>998</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>381</th>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     dtree prediction  truth\n",
       "519                 2      2\n",
       "837                 2      2\n",
       "208                 2      2\n",
       "525                 2      2\n",
       "978                 2      2\n",
       "583                 2      2\n",
       "508                 3      3\n",
       "158                 3      3\n",
       "589                 3      3\n",
       "201                 2      2\n",
       "280                 1      1\n",
       "167                 1      1\n",
       "974                 2      2\n",
       "95                  3      3\n",
       "236                 1      1\n",
       "203                 2      2\n",
       "790                 5      5\n",
       "428                 2      2\n",
       "998                 2      2\n",
       "381                 4      4"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "predictions = trained_model.predict(input_test)\n",
    "results = pd.DataFrame({'dtree prediction': predictions.astype(int),\n",
    "                        'truth': output_test})\n",
    "results.head(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another approach is to plot predicted data and real data on the same chart"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAiAAAAFlCAYAAADS9FNeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAneklEQVR4nO3dfZRcdZ3n8c+3q5/S3bUq3TnIELoruo4YArZJkwPrEDMkgoIHnRFnYXJmZY6c1m5w8TjDjpg5+LQ5467uPMHwkHE4caZbBwVHHRZmxTEccBA0ZALyIIiQhCBMQnjqSueBdL77R91qqjv1eKu6btWt9+uce6rq3t/93W//Ukk+fe+v6pq7CwAAoJ7aoi4AAAC0HgIIAACoOwIIAACoOwIIAACoOwIIAACoOwIIAACou/aoDjwwMOCpVCqqwwMAgAX2wAMPvODui/NtiyyApFIpbd26NarDAwCABWZmOwtt4xIMAACoOwIIAACoOwIIAACou8jmgOTz2muvaffu3Tp48GDUpTS17u5uLVmyRB0dHVGXAgBAXg0VQHbv3q1kMqlUKiUzi7qcpuTu2rdvn3bv3q2lS5dGXQ4AAHk11CWYgwcPqr+/n/BRBTNTf38/Z5EAAA2toQKIJMJHDTCGAIBG13ABJGqJRELDw8Navny5PvKRj2h6ejp0X5dccoluueUWSdKll16qRx99tGDbu+66S/fee2/Fx0ilUnrhhRdC1wgAQBQIIPMsWrRI27dv18MPP6zOzk7dcMMNc7YfOXIkVL9f+9rXtGzZsoLbwwYQAACaUdkBxMwSZvbvZnZbnm1dZnazmT1pZvebWaqmVUbkrLPO0pNPPqm77rpLZ511li644AItW7ZMMzMzuvLKK3X66afrtNNO04033igpMwH08ssv19vf/natW7dOe/bsme1rzZo1s9/8+i//8i9asWKF3vnOd2rt2rXasWOHbrjhBv3FX/yFhoeHdc8992jv3r368Ic/rNNPP12nn366/u3f/k2StG/fPp1zzjk65ZRTdOmll8rd6z8wACIxOTmpVCqltrY2pVIpTU5Ollw/MDAgM5OZKZlMamBg4Jh25RxrfHw87zFK1VbpdjNTe3u7zKxkjWhy7l7WIunTkr4h6bY828Yl3RA8v0jSzaX6W7lypc/36KOPvv7iiivc3/Oe2i5XXHHMMefr7e11d/fXXnvNL7jgAr/uuut8y5Yt3tPT40899ZS7u994443+pS99yd3dDx486CtXrvSnnnrKb731Vl+3bp0fOXLEn332WX/DG97g3/72t93d/T3veY//7Gc/8z179viSJUtm+9q3b5+7u3/uc5/zr3zlK7N1XHzxxX7PPfe4u/vOnTv95JNPdnf3T37yk/6FL3zB3d1vu+02l+R79+4tPpYAmt7ExIT39PS4pNmlp6fHx8bGCq7v6OiYs37+0tPT4xMTE2Udq9i+hWqrZnupGtEcJG31AjmgrI/hmtkSSedL2hgEkfk+KOnzwfNbJF1rZhYcvKkcOHBAw8PDkjJnQD72sY/p3nvv1apVq2Y/1vqDH/xADz300Oz8jldeeUW//OUvdffdd+viiy9WIpHQb/zGb+jss88+pv/77rtPq1evnu3ruOOOy1vHD3/4wzlzRl599VWl02ndfffd+s53viNJOv/88/WmN72pZj87gMa1YcMGvTY9rbMldWZXTk9r1403avXRo3MbB+vXzl8/3/S07vz0p7V+3r9Dd37601pdav5bzr5521e7vUSNqLGzzpL6+up6yHK/B+QvJf0PSckC20+U9IwkufsRM3tFUr+kObMjzWxU0qgkDQ4OljjiX5ZZWm1l54DM19vbO/vc3XXNNdfo3HPPndPm9ttvr1kdR48e1X333afu7u6a9Qmgee3atUvrJf3D/A2FQkap8JG1Z4903nlzVm0ut6hg34Ltq91epEbU2COPSEXmKS6EkgHEzD4gaY+7P2Bma6o5
mLtvkrRJkkZGRpru7EjWueeeq+uvv15nn322Ojo69MQTT+jEE0/U6tWrdeONN+qjH/2o9uzZoy1btuj3f//35+x7xhlnaHx8XE8//bSWLl2qF198Uccdd5ySyaReffXV2XbnnHOOrrnmGl155ZWSpO3bt2t4eFirV6/WN77xDf3pn/6p7rjjDr300kt1/dkBRGNwcFCLd2ZuLHq2pP3B+kRbm2byhI1C6+c74c1v1ne/+9056z70oQ/pueefL3vfQu2r3V6sRtRYKlX/Yxa6NuOvz+/4M0m7Je2Q9LykaUkT89r8P0lnBs/blTnzYcX6LTkHJCLZOSC5tmzZ4ueff/7s65mZGb/qqqt8+fLlfsopp/iaNWv85Zdf9qNHj/pll13mv/mbv+nr1q3z97///cfMAXF3v/322314eNhPO+00X7dunbu7P/74437qqaf6O9/5Tr/77rt97969/nu/93t+6qmn+jve8Q7/+Mc/7u7uL7zwgr/3ve/1ZcuW+aWXXuqDg4PMAQFawMTEhH+po8Nd8jbmgKBJqMgckLInoWb60Rrln4R6meZOQv1Wqb4aNYDEBWMJxM8j553n02ZuZj40NDTnP/ChoaG86/v7+2f/M+/r6/P+/v5j2uUzv8+xsbG8xyjUPux2SZ5IJFxSyRrR+IoFEPMK5okGl2D+2N0/YGZfDDr+vpl1K3Np8l2SXpR0kbs/VayvkZERz34sNeuxxx7TO97xjrLrQWGMJRBDY2PSrbdm5kQATcDMHnD3kXzbKroZnbvfJemu4PnVOesPSvpI+BIBACWl03X/pAKwUPgmVABoFgQQxAgBBACaBQEEMUIAAYBmQQBBjBBAAKBZEEAQIwSQHC+//LKuu+66ivfbvHmzfv3rX8++TqVSeuGFF4rsAQAhpNNSstAXUgPNhQCSo1AAOXLkSNH95gcQAFgQnAFBjDR1ACl1e+dKfeYzn9GvfvUrDQ8P6/TTT9dZZ52lCy64QMuWLdOOHTu0fPny2bZf/epX9fnPf1633HKLtm7dqvXr12t4eFgHDhyQJF1zzTVasWKFTj31VP3iF7+oqi4AkEQAQaw0bQCZnJzU6Oiodu7cKXfXzp07NTo6WlUI+fKXv6y3vvWt2r59u77yla9o27Zt+qu/+is98cQTBfe58MILNTIyosnJSW3fvl2LFi2SJA0MDGjbtm0aGxvTV7/61dA1AYAk6cgR6eBBAghio2kDyIYNGzQ97/bN09PT2rBhQ82OsWrVKi1dujTUvr/7u78rSVq5cqV27NhRs5oAtKj9we3nCCCIiaYNILt27apofRi9vb2zz9vb23U0586SBw8eLLpvV1eXJCmRSJScQwIAJaXTmUcCCGKiaQPI4OBgRevLkUwmNTU1lXfb8ccfrz179mjfvn06dOiQbrvttrL2A4CaIIAgZiq6F0wj2bhxo0ZHR+dchunp6dHGjRtD99nf3693v/vdWr58uRYtWqTjjz9+dltHR4euvvpqrVq1SieeeKJOPvnk2W2XXHKJPvGJT2jRokX6yU9+Evr4AFBQ9pccAghioqK74dZSLe6GOzk5qQ0bNmjXrl0aHBzUxo0btX79+lqX2pS4Gy4QM3fdJf32b0s/+lHmEWgCNbsbbqNZv349gQNAa+ASDGKmaeeAAEBLIYAgZgggANAMCCCImYYLIFHNSYkTxhCIIQIIYqahAkh3d7f27dvHf6BVcHft27dP3d3dUZcCoJayASTn+4mAZtZQk1CXLFmi3bt3a+/evVGX0tS6u7u1ZMmSqMsAUEvptNTZmVmAGGioANLR0RH6q88BINbSaSmZjLoKoGYa6hIMAKAA7oSLmCGAAEAzIIAgZgggANAMCCCIGQIIADQDAghihgACAM2AAIKYIYAAQDMggCBmCCAA0AympgggiBUCCAA0A86AIGYIIADQ6GZmpOlpAghihQACAI1uejrzSABBjBBAAKDRcSdcxBABBAAaHQEEMUQAAYBGRwBBDBFA
AKDREUAQQwQQAGh02QCSTEZbB1BDBBAAaHScAUEMEUAAoNERQBBDBBAAaHQEEMQQAQQAGh0BBDFEAAGARpdOS+3tUmdn1JUANVMygJhZt5n91MweNLNHzOwLedpcYmZ7zWx7sFy6MOUCQAvK3ojOLOpKgJppL6PNIUlnu3vazDok/djM7nD3++a1u9ndL699iQDQ4qamuPyC2CkZQNzdJQUXINURLL6QRQEAcmTPgAAxUtYcEDNLmNl2SXsk3enu9+dp9mEze8jMbjGzk2pZJAC0NAIIYqisAOLuM+4+LGmJpFVmtnxek3+WlHL30yTdKenr+foxs1Ez22pmW/fu3VtF2QDQQgggiKGKPgXj7i9L2iLpffPW73P3Q8HLr0laWWD/Te4+4u4jixcvDlEuALQgAghiqJxPwSw2szcGzxdJeq+kX8xrc0LOywskPVbDGgGgtRFAEEPlfArmBElfN7OEMoHlW+5+m5l9UdJWd/++pP9uZhdIOiLpRUmXLFTBANByCCCIoXI+BfOQpHflWX91zvOrJF1V29IAAJIIIIglvgkVABqZeyaAJJNRVwLUFAEEABrZgQOZEMIZEMQMAQQAGhk3okNMEUAAoJERQBBTBBAAaGQEEMQUAQQAGhkBBDFFAAGARkYAQUwRQACgkU1NZR4JIIgZAggANDLOgCCmCCAA0MgIIIgpAggANDICCGKKAAIAjSydlsykRYuirgSoKQIIADSy7I3ozKKuBKgpAggANDLuhIuYIoAAQCMjgCCmCCAA0MjSaSmZjLoKoOYIIADQyDgDgpgigABAIyOAIKYIIADQyAggiCkCCAA0MgIIYooAAgCNjACCmCKAAECjcieAILYIIADQqA4dko4cIYAglgggANCouBEdYowAAgCNigCCGCOAAECjIoAgxgggANCoCCCIMQIIADQqAghijAACAI2KAIIYI4AAQKMigCDGCCAA0KiyASSZjLYOYAEQQACgUXEGBDFGAAGARpUNID090dYBLAACCAA0qnRa6u2V2vinGvHDuxoAGhU3okOMEUAAoFERQBBjBBAAaFQEEMQYAQQAGtXUFAEEsUUAAYBGxRkQxBgBBAAaFQEEMUYAAYBGRQBBjJUMIGbWbWY/NbMHzewRM/tCnjZdZnazmT1pZvebWWpBqgUQG+Pj42pvb5eZzVn6+vqUTCaPWV/O0tbWpr6+PpnZbN/Zx4GBAXV1deXdL5FIyMyUSqU0OTlZsLZCxyy0LZlManJycvZnnpycVCqVUltbm1KplMbHxzUwMHDMfu3t7RofHyeAINbM3Ys3MDNJve6eNrMOST+WdIW735fTZlzSae7+CTO7SNLvuPt/LdbvyMiIb926tfqfAEDTGR8f1/XXXx91GXm1t7fryJEjNe1v8+bNkqTR0VFNT0+Xve/htjZ1XHml9OUv16weoJ7M7AF3H8m3rb3Uzp5JKMH3AasjWOanlg9K+nzw/BZJ15qZeal0A6Albdq0ac7r/yJph6RfF9nnLZI+tGAV5ahh+Mj29/QnPylJ+kQF4aNNUsfRo5wBQWyVDCCSZGYJSQ9I+s+S/sbd75/X5ERJz0iSux8xs1ck9Ut6YV4/o5JGJWlwcLC6ygE0rZmZmTmvvyvpG5I+VWSfqyRdumAVLbCXXgq/79vfXrs6gAZSVgBx9xlJw2b2Rkn/ZGbL3f3hSg/m7pskbZIyl2Aq3R9APCQSiTkh5E2S3lhinzdI+oWk0xeurAUzeNJJkqRdzzxT2Y5tbZr6yEcWoCIgemUFkCx3f9nMtkh6n6TcAPKspJMk7TazdmX+rdhXsyoBxMro6OjsHJAuZf4hKnWhISnpFb1+PXihLMQckM/+2Z9JqnwOyNjHP16zOoBGU86nYBYHZz5kZoskvVeZX0RyfV/SR4PnF0r6EfM/ABRy3XXXaWxsTIlEQslgXfaxt7dXfXnmPfSpdPgwM/X29krKnGXJfezv71dnZ2fe/dqCu80ODQ1p8+bNs7WV
IzNPP7++vj5t3rxZ69ev1/r167Vp0yYNDQ3JzDQ0NKSxsTH19/cfs18ikdDY2Jiuu+66smoAmlE5n4I5TdLXJSWUCSzfcvcvmtkXJW119++bWbekf5D0LkkvSrrI3Z8q1i+fggEgSXr6aektb5HOPFO6997C7YaHpaEh6Xvfq1tpAKpT7adgHlImWMxff3XO84OSuFAJoHLp9NzHYu34RAgQG3wTKoBoEUCAlkQAARAtAgjQkgggAKJVTgA5elTav58AAsQIAQRAtLLB48ABad4XlM3KfnSVAALEBgEEQLRyz3zs31+8DQEEiA0CCIBo5QaQQpdhsuuTyfzbATQdAgiAaFUSQDgDAsQGAQRAtAggQEsigACIFgEEaEkEEADRIoAALYkAAiBa6bSUvUkcAQRoGQQQANFKp6U3v/n15/lMTWUeCSBAbBBAAERraqp0AOEMCBA7BBAA0SrnDEg6LSUSUldX/eoCsKAIIACilU5Lxx0ndXQUDyB9fZJZfWsDsGAIIACilQ0XfX2vz/Uo1AZAbBBAAEQrN4CUOgMCIDYIIACic/iw9NprBBCgBRFAAEQn99MtBBCgpRBAAESHAAK0LAIIgOhUEkCSyfrVBWDBEUAARIczIEDLIoAAiE42cCSTmYUAArQMAgiA6JRzBsSdAALEEAEEQHTmB5D9+6WjR+e2OXgws44AAsQKAQRAdOYHEHfpwIHCbQDEBgEEQHTmB5DcdVnZr2cngACxQgABEJ1suOjtLRxAOAMCxBIBBEB00mlp0SIpkSCAAC2GAAIgOrmfbiGAAC2FAAIgOgQQoGURQABEhwACtCwCCIDo5Asg2YmpuW1ytwOIBQIIgOhwBgRoWQQQANEpN4CYST099a0NwIIigACITm4A6e6W2tryB5C+vkwIARAbBBAA0ckNIGb5b0jHjeiAWCKAAIhOOi0lk6+/TiYJIECLIIAAiMaRI5k73eaGC86AAC2DAAIgGvv3Zx4JIEBLKhlAzOwkM9tiZo+a2SNmdkWeNmvM7BUz2x4sVy9MuQBiI9/Ha/MFkKkpAggQQ+1ltDki6Y/cfZuZJSU9YGZ3uvuj89rd4+4fqH2JAGKpUAB5/vlj2y1dWr+6ANRFyTMg7v6cu28Lnk9JekzSiQtdGICYy37jKZdggJZU0RwQM0tJepek+/NsPtPMHjSzO8zslFoUByDGyr0EQwABYqmcSzCSJDPrk3SrpE+5+6vzNm+TNOTuaTM7T9J3Jb0tTx+jkkYlaXBwMGzNAOKgnADiTgABYqqsMyBm1qFM+Jh09+/M3+7ur7p7Onh+u6QOMxvI026Tu4+4+8jixYurLB1AUysWQNwzrw8fznxclwACxE45n4IxSX8n6TF3//MCbd4ctJOZrQr63VfLQgHETKEAMjMjHTpUuA2AWCjnEsy7Jf2BpJ+b2fZg3WclDUqSu98g6UJJY2Z2RNIBSRe5Z3+FAYA8CgWQ7LbubgIIEGMlA4i7/1hS0btAufu1kq6tVVEAWkA2XPT2vr4uN4AMDBBAgBjjm1ABRCOdljo7M0tWbgDJfSSAALFDAAEQjXyfbsm+zn5HSDaA5N6wDkAsEEAARGP+nXCl119zBgSIPQIIgGgUOwNCAAFijwACIBoEEKClEUAARIMAArQ0AgiAaJQTQLKTUXt66lcXgLoggACIRr4AsmiRZDb3DEhPj5RI1L8+AAuKAAIgGlNTxwaQtrbMF5PlBhAuvwCxRAABEI1C4SL3jrgEECC2CCAA6m9mRpqeJoAALYwAAqD+pqczjwQQoGURQADUX7GP1xJAgJZAAAFQfwQQoOURQADUHwEEaHkEEAD1RwABWh4BBED9VRJA5t8xF0AsEEAA1F82YOQLF8lkZvtrr0mHDnEGBIgpAgiA+it1BuTwYemllwq3AdD0CCAA6q9UAJGk558v3AZA0yOAAKi/cgLIc88VbgOg6RFAANRfOi21t0udncdu4wwI0BIIIADqL/vxWrNjtxFAgJZA
AAFQf1NThYMFAQRoCQQQAPVX7AvGCCBASyCAAKg/AgjQ8gggAOqPAAK0PAIIgPqrJID09tanJgB1RQABUH/FAkg2cLz8stTVJXV01K0sAPVDAAFQf8UCSCIhLVqUec7lFyC2CCAA6q9YAJFe30YAAWKLAAKgvtwzASTfnXCzstuKtQHQ1AggAOrrwIFMCOEMCNDSCCAA6qvYjeiyCCBA7BFAANQXAQSACCAA6o0AAkAEEAD1RgABIAIIgHojgAAQAQRAvU1NZR4JIEBLI4AAqC/OgAAQAQRAvRFAAKiMAGJmJ5nZFjN71MweMbMr8rQxM/trM3vSzB4ysxULUy7QWCYnJ5VKpWRmoZe2traq9o9i6e7uVltbm/r6+kr+bH19fWpra1NXV5fMTH9y+eWSpJ7jj1cqldLk5OScMR0fH9fH/+iPJEl/MDamdevWKZVKqa2t7Zj22fHPt62Rlaq7WX8uoCLuXnSRdIKkFcHzpKQnJC2b1+Y8SXdIMklnSLq/VL8rV650oJlNTEx4T0+PS2KpYPmi5DM5r3t6enxiYsLd3cfGxlySX5T5rlS/IM/+2fb5xj+3r0ZVqu5m/bmAfCRt9QI5wDLby2dm35N0rbvfmbPuRkl3ufs3g9ePS1rj7s8V6mdkZMS3bt1a0bGBRpJKpbRz505JmeT9PyUdH2lFzWGVpCFJb8hZNzQ0pB07dqi9vV0zMzP6gKR/lrRW0o/y9DE0NCRJs+M/f9uOHTtqXXbN5L5vcmXrLrUdaCZm9oC7j+Tb1l5hRylJ75J0/7xNJ0p6Juf17mDdnABiZqOSRiVpcHCwkkMDDWfXrl2zz1OSPivpRUn7I6qnmfzfea+zYzkzMyNJ2ibpx5J+XmD/3LGvZFsjKFRfdn2p7UBclB1AzKxP0q2SPuXur4Y5mLtvkrRJypwBCdMH0CgGBwdnf1PNTpW8VNI/RVZR88r+QpJIJDQzM6NfSzqrjPb5zhQ0+i83ue+b+evL2Q7ERVmfgjGzDmXCx6S7fydPk2clnZTzekmwDoitjRs3qqenR9LrASQdXTlNq6enRxs3bpQkjY6Olt0+d/zz9dWoStXdrD8XULFCk0P89QmmJunvJf1lkTbna+4k1J+W6pdJqIiDiYkJHxoa8nOCSZNnhpiUaWaRTwytdOnq6nIz897e3pI/W29vr5uZd3Z2HrN9aGjomMmVY2NjnkgkXJInEglfu3atDw0NuZkd0z47/vm2NbJSdTfrzwXMp2omoZrZb0m6R5nLsUeD1Z+VNBgEmBvMzCRdK+l9kqYl/aG7F51hyiRUxMqtt0oXXig9+KB02mlRVwMADaGqSaju/mNlzmwUa+OSLgtXHhAD5Xy5FgBgFt+ECtQCAQQAKkIAAWqBAAIAFSGAALWQTktm0qJFUVcCAE2BAALUQjqdOfthRadLAQACBBCgFrIBBABQFgIIUAvptJRMRl0FADQNAghQC5wBAYCKEECAWiCAAEBFCCBALRBAAKAiBBCgFgggAFARAghQCwQQAKgIAQSoBQIIAFSEAAJUy12amiKAAEAFCCBAtQ4dkmZmCCAAUAECCFAtbkQHABUjgADVIoAAQMUIIEC1CCAAUDECCFAtAggAVIwAAlSLAAIAFSOAANUigABAxQggQLWyASSZjLYOAGgiBBCgWpwBAYCKEUCAahFAAKBiBBCgWtkA0tMTbR0A0EQIIEC10mmpt1dq468TAJSLfzGBanEnXACoGAEEqBYBBAAqRgABqjU1RQABgAoRQIBqcQYEACpGAAGqRQABgIoRQIBqEUAAoGIEEKBaBBAAqBgBBKgWAQQAKkYAAapFAAGAihFAgGocPpxZCCAAUBECCFCN/fszj8lktHUAQJMhgADV4E64ABAKAQSoBgEEAEIhgADVIIAAQCgEEKAaBBAACKVkADGzm8xsj5k9XGD7GjN7xcy2B8vVtS8TaFAEEAAIpb2MNpslXSvp74u0ucfdP1CTioBmQgABgFBKngFx
97slvViHWoDmMzWVeSSAAEBFajUH5Ewze9DM7jCzU2rUJ9D4OAMCAKGUcwmmlG2Shtw9bWbnSfqupLfla2hmo5JGJWlwcLAGhwYilg0gvb3R1gEATabqMyDu/qq7p4Pnt0vqMLOBAm03ufuIu48sXry42kMD0Uunpe5uqb0WWR4AWkfVAcTM3mxmFjxfFfS5r9p+gabAjegAIJSSv7aZ2TclrZE0YGa7JX1OUockufsNki6UNGZmRyQdkHSRu/uCVQw0EgIIAIRSMoC4+8Ultl+rzMd0gdZDAAGAUPgmVKAaBBAACIUAAlQjnZaSyairAICmQwABqsEZEAAIhQACVIMAAgChEECAahBAACAUAghQDQIIAIRCAAHCmpmRDhwggABACAQQIKz9+zOPBBAAqBgBBAhrairzSAABgIoRQICwsnfCJYAAQMUIIEBYBBAACI0AAoRFAAGA0AggQFgEEAAIjQAChEUAAYDQCCBAWAQQAAiNAAKERQABgNAIIEBYBBAACI0AAoSVTkudnZkFAFARAggQFjeiA4DQCCBAWAQQAAiNAAKERQABgNAIIEBYBBAACI0AAoRFAAGA0AggQFhTUwQQAAiJAAKExRkQAAiNAAKERQABgNAIIEBYBBAACI0AAoRx9Ki0fz8BBABCIoAAYUxPZx4JIAAQCgEECIMb0QFAVQggQBgEEACoCgEECIMAAgBVIYAAYWQDSDIZbR0A0KQIIEAYnAEBgKoQQIAwCCAAUBUCCBAGAQQAqkIAAcIggABAVQggQBgEEACoCgEECGNqSkokpK6uqCsBgKZEAAHCyN6IzizqSgCgKRFAgDC4Ey4AVKVkADGzm8xsj5k9XGC7mdlfm9mTZvaQma2ofZn1sW7dOpnZ7LJu3bqy9pucnNTAwMDsfslkUgMDA2pra1MqldLk5KQmJyeVSqXmrCvVZ2778fHxOcdoa2ubU2tfX98xxyzVd75+EomEzCxv3QMDA+rr65tt293dPWf/7PP29naNj4/nHZeurq45x0smk3Ne51uy/c6vtdDS3d2t8fHx2Z8xd2lvbz/mZy22FGpz80036RfPPjtnrAAAFXD3oouk1ZJWSHq4wPbzJN0hySSdIen+Un26u1auXOmNZO3atS7pmGXt2rVF95uYmPCOjo68+2aXzs7OY9r09PT4xMREwT57enqK9llqKdR/JX3nq5sls9wm+c/K/PMEgFYlaasXyAGW2V6cmaUk3ebuy/Nsu1HSXe7+zeD145LWuPtzxfocGRnxrVu3ljx2xW66Sfrbv614t5/cd1/BbWeecUbBbdu2bdOhw4crPp4kdXV2asWKY08YVdNnqf5r1XerWybp3yX9ds66oaEh7dixI5qCAKABmdkD7j6Sb1t7Dfo/UdIzOa93B+uOCSBmNippVJIGBwdrcOg8Ojul//SfKt7t1WIbi/S3t5r/zA8fztt3VX2W6L9mfbe4+yTNv+iya9euKEoBgKZUizMgt0n6srv/OHj9r5L+xN2Lnt5YsDMgIVmRTzMUG6NUKqWdO3eGOmah35ir6bNU/7XqG8fiDAgAzFXsDEgtPgXzrKSTcl4vCdY1lbVr11a0Pmvjxo3q6Ogo2qazs/OYNj09Pdq4cWPBPnt6eor2WUqh/ivpO1/dyK/YnycAII9Ck0N87kTTlApPQj1fcyeh/rScPhttEqr7sRNRS01AzZqYmPD+/v7Z/fr6+ry/v9/NzIeGhnxiYsInJiZ8aGhozrpSfea2Hxsbm3MMM5tTa29v7zHHLNV3vn7a2tpcUt66+/v7vbe3d7ZtV1fXnP2zzxOJhI+NjeUdl87OzjnH6+vrKznhM9vv/FoLLV1dXT42Njb7M+YuiUTimJ+12FJOm3L+PAGgFamaSahm9k1JayQNSPoPSZ+T1BGElxssc+3iWknvkzQt6Q+9xOUXqfEuwQAAgNqqahKqu19cYrtLuixkbQAAoAXxTagAAKDuCCAAAKDuCCAAAKDuCCAAAKDu
CCAAAKDuCCAAAKDuCCAAAKDuCCAAAKDuCCAAAKDuyrob7oIc2GyvpIW6LeuApBcWqO84Y9zCYdzCYdwqx5iFw7iFU4txG3L3xfk2RBZAFpKZbS303fMojHELh3ELh3GrHGMWDuMWzkKPG5dgAABA3RFAAABA3cU1gGyKuoAmxbiFw7iFw7hVjjELh3ELZ0HHLZZzQAAAQGOL6xkQAADQwGIXQMzsfWb2uJk9aWafibqeRmZmO8zs52a23cy2BuuOM7M7zeyXweOboq4zamZ2k5ntMbOHc9blHSfL+Ovg/feQma2IrvLoFBizz5vZs8H7bbuZnZez7apgzB43s3OjqTp6ZnaSmW0xs0fN7BEzuyJYz/utiCLjxnuuADPrNrOfmtmDwZh9IVi/1MzuD8bmZjPrDNZ3Ba+fDLanqi7C3WOzSEpI+pWkt0jqlPSgpGVR19Woi6Qdkgbmrfvfkj4TPP+MpP8VdZ1RL5JWS1oh6eFS4yTpPEl3SDJJZ0i6P+r6G2jMPi/pj/O0XRb8Xe2StDT4O5yI+meIaNxOkLQieJ6U9EQwPrzfwo0b77nCY2aS+oLnHZLuD95D35J0UbD+BkljwfNxSTcEzy+SdHO1NcTtDMgqSU+6+1PufljSP0r6YMQ1NZsPSvp68Pzrkj4UXSmNwd3vlvTivNWFxumDkv7eM+6T9EYzO6EuhTaQAmNWyAcl/aO7H3L3pyU9qczf5Zbj7s+5+7bg+ZSkxySdKN5vRRUZt0Ja/j0XvGfSwcuOYHFJZ0u6JVg//72WfQ/eImmtmVk1NcQtgJwo6Zmc17tV/E3Y6lzSD8zsATMbDdYd7+7PBc+fl3R8NKU1vELjxHuwuMuDSwU35VzeY8zyCE5xv0uZ30x5v5Vp3rhJvOcKMrOEmW2XtEfSncqcCXrZ3Y8ETXLHZXbMgu2vSOqv5vhxCyCozG+5+wpJ75d0mZmtzt3omXNtfEyqBMapbNdLequkYUnPSfo/kVbTwMysT9Ktkj7l7q/mbuP9VlieceM9V4S7z7j7sKQlypwBOrmex49bAHlW0kk5r5cE65CHuz8bPO6R9E/KvAH/I3sKN3jcE12FDa3QOPEeLMDd/yP4B++opL/V66e8GbMcZtahzH+ik+7+nWA177cS8o0b77nyuPvLkrZIOlOZy3jtwabccZkds2D7GyTtq+a4cQsgP5P0tmAWb6cyE2W+H3FNDcnMes0smX0u6RxJDyszXh8Nmn1U0veiqbDhFRqn70v6b8GnE86Q9ErOqfOWNm9uwu8o836TMmN2UTDLfqmkt0n6ab3rawTBNfW/k/SYu/95zibeb0UUGjfec4WZ2WIze2PwfJGk9yozd2aLpAuDZvPfa9n34IWSfhScjQsv6pm4tV6UmRX+hDLXsjZEXU+jLsp8UujBYHkkO1bKXNP7V0m/lPRDScdFXWvUi6RvKnP69jVlrol+rNA4KTOz/G+C99/PJY1EXX8Djdk/BGPyUPCP2Qk57TcEY/a4pPdHXX+E4/ZbylxeeUjS9mA5j/db6HHjPVd4zE6T9O/B2Dws6epg/VuUCWNPSvq2pK5gfXfw+slg+1uqrYFvQgUAAHUXt0swAACgCRBAAABA3RFAAABA3RFAAABA3RFAAABA3RFAAABA3RFAAABA3RFAAABA3f1/VIbekX5fgmwAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 1440x432 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.rcParams[\"figure.figsize\"] = (20,6)\n",
    "\n",
    "# Predict the risk for a single card holder across the whole range of amounts\n",
    "holder_index = 2\n",
    "testing = pd.DataFrame({'amount': np.linspace(0, input_test['amount'].max(), 100), 'holder_index': holder_index})\n",
    "# Real observations for the same card holder\n",
    "sub_test = df[df['holder_index']==holder_index]\n",
    "\n",
    "DT_predictions = trained_model.predict(testing).astype(int)\n",
    "plt.subplot(121)\n",
    "# Red line: model predictions; black dots: observed values\n",
    "plt.plot(testing['amount'], DT_predictions, color='red', label='Predicted')\n",
    "plt.scatter(sub_test['amount'], sub_test['dispute_risk'], color='black', label='truth')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Scoring and visualization are just simple ways to test the trained model. They are usually not enough for real-world use cases.\n",
    "\n",
    "There are more advanced ways to analyze how the model is performing (e.g. ROC curves), and there are many other aspects to consider: fairness, explainability, interpretability.\n",
    "\n",
    "Additional resources:\n",
    "- https://en.wikipedia.org/wiki/Receiver_operating_characteristic\n",
    "- https://christophm.github.io/interpretable-ml-book/"
   ]
  },
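  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small step beyond a single score, a confusion matrix and a per-class report show which risk levels the model confuses with each other. The sketch below assumes scikit-learn 0.22 or later (for the `zero_division` parameter) and uses short, made-up label lists; in this notebook `output_test` and `predictions` would take their place.\n",
    "\n",
    "```python\n",
    "from sklearn.metrics import classification_report, confusion_matrix\n",
    "\n",
    "# Made-up labels standing in for output_test and predictions\n",
    "truth = [2, 2, 3, 1, 5, 4, 2, 3]\n",
    "predicted = [2, 2, 3, 1, 4, 4, 2, 2]\n",
    "\n",
    "# Risk levels to report on, in the order used for rows and columns\n",
    "labels = [1, 2, 3, 4, 5]\n",
    "\n",
    "# Rows are true classes, columns are predicted classes\n",
    "print(confusion_matrix(truth, predicted, labels=labels))\n",
    "# Precision, recall and F1 per class; undefined values are reported as 0\n",
    "print(classification_report(truth, predicted, labels=labels, zero_division=0))\n",
    "```\n",
    "\n",
    "Per-class numbers matter here because a few rare high-risk classes can be misclassified without moving the overall accuracy much."
   ]
  },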
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 5 - Save the model as PMML\n",
    "When you are happy with your model, you can export it as PMML."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "skl_to_pmml(trained_model, ['amount', 'holder_index'], 'dispute_risk',\n",
    "                \"dtree_risk_predictor.pmml\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
